Haiku: interactive comprehensible data mining
Authors
Abstract
This paper discusses a novel data mining system devised and developed at Birmingham, which attempts to make data mining more effective. It ties in with the workshop goals in several areas: the system tries to let users see the big picture without getting caught up in irrelevant detail too early, it uses perception-based approaches to data mining, and it uses interactive visualisation techniques to assist in this process.

BIG QUESTIONS

In data mining, or knowledge discovery, we are essentially faced with a mass of data that we are trying to make sense of. We are looking for something "interesting". Quite what "interesting" means is hard to define, however: one day we are intrigued by the general trend that most of the data follows; the next, by why there are a few outliers to that trend. For a data mining system to be generically useful to us, it must therefore give us some way to indicate what is interesting and what is not, and that indication must be dynamic and changeable.

The second issue to address is that, once we can ask the question appropriately, we need to be able to understand the answers that the system gives us. It is therefore important that the responses of the system are represented in ways that we can understand.

Thirdly, we should recognise the relative strengths of users and computers. The human visual system is exceptionally good at clustering, and at recognising patterns and trends, even in the presence of noise and distortion. Computer systems are exceptionally good at crunching numbers, producing exact parameterisations and exploring large numbers of alternatives.

An ideal data mining system should, we would argue, offer the above characteristics and use the best features of both the user and the computer in producing its answers. This leads us towards a system that will be interactive, in order to be flexible and work towards a solution.
It should use visualisation techniques to offer the user the opportunity to do both perceptual clustering and trend analysis, and to offer a mechanism for feeding back the results of machine-based data mining. It should have a data mining engine that is powerful and effective, and that can also produce humanly comprehensible results. The Haiku system was developed with these principles in mind, and offers a symbiotic system that couples interactive 3-d dynamic visualisation technology with a novel genetic algorithm.

VISUALISATION

The visualisation engine used in the Haiku system provides an abstract 3-d perspective of multi-dimensional data. The visualisation consists of nodes and links, whose properties are given by the parameters of the data. Data elements affect parameters such as node size, mass, link strength and elasticity, and so on. Multiple elements can affect one parameter, or a subset of parameters can be chosen.

Nodes are scattered randomly into the 3-d space, with their associated links. This 3-d space obeys a set of physical-type laws, which affect this initial arrangement. Links tend to want to assume a particular length, and tend to pull inwards until they reach that length, or push outwards if they are compressed, just as a spring does in the real world. Nodes tend to repel each other, based on their mass. This can be seen as a force-directed graph visualisation. The physics of the space are adjustable, but are chosen so that a static steady-state solution can be reached; this is unlike the real world, in which a steady state can involve motion, with one body orbiting another.

This initial state is then allowed to evolve, and the links and nodes shuffle themselves around until they reach a local-minimum, low-energy steady state. This is then static, and can be explored at will by rotating it, zooming in and flying through and around it. It is a completely abstract representation of the data, and so has no preconceptions built in.
Different data-to-attribute mappings will clearly give different structures, but the system can at least produce a view of more than three dimensions of the raw data at once.
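
The spring-and-repulsion behaviour described above can be illustrated with a minimal sketch. This is not the Haiku implementation: the function name `layout`, the inverse-square repulsion law, the overdamped update rule (nodes move along the net force and carry no velocity, so any steady state is static), and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def layout(masses, links, repulsion=1.0, step=0.05,
           max_iters=5000, tol=1e-5, seed=0):
    """Relax nodes in 3-d space until forces balance.

    masses: one (hypothetical) mass per node, e.g. derived from a data field.
    links:  tuples (i, j, rest_length, stiffness) between node indices.
    Returns a list of [x, y, z] positions.
    """
    rng = random.Random(seed)
    n = len(masses)
    # Nodes are scattered randomly into the 3-d space.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]

    for _ in range(max_iters):
        force = [[0.0, 0.0, 0.0] for _ in range(n)]

        # Nodes repel each other based on their mass
        # (assumed inverse-square law).
        for i in range(n):
            for j in range(i + 1, n):
                d = [pos[j][k] - pos[i][k] for k in range(3)]
                dist = max(math.sqrt(sum(c * c for c in d)), 1e-9)
                mag = repulsion * masses[i] * masses[j] / (dist * dist)
                for k in range(3):
                    force[i][k] -= mag * d[k] / dist
                    force[j][k] += mag * d[k] / dist

        # Links behave like springs: they pull inwards when stretched
        # beyond their rest length and push outwards when compressed.
        for i, j, rest, stiffness in links:
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            dist = max(math.sqrt(sum(c * c for c in d)), 1e-9)
            mag = stiffness * (dist - rest)
            for k in range(3):
                force[i][k] += mag * d[k] / dist
                force[j][k] -= mag * d[k] / dist

        # Overdamped update: move along the net force, with each
        # coordinate change capped for numerical stability.
        biggest = 0.0
        for i in range(n):
            for k in range(3):
                move = max(-0.1, min(0.1, step * force[i][k]))
                pos[i][k] += move
                biggest = max(biggest, abs(move))

        # Stop once the arrangement has effectively stopped changing.
        if biggest < tol:
            break
    return pos


if __name__ == "__main__":
    # Three nodes in a chain; the heavier middle node pushes harder.
    result = layout([1.0, 2.0, 1.0], [(0, 1, 1.5, 1.0), (1, 2, 1.5, 1.0)])
    for point in result:
        print([round(c, 3) for c in point])
```

Because the starting positions are random, a different `seed` can settle into a different local minimum, which matches the description of the layout as a local-minimum steady state rather than a unique solution.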
Similar resources
Evolutionary Approaches to Visualisation and Knowledge Discovery
Haiku is a data mining system which combines the best properties of human and machine discovery. A self-organising visualisation system is coupled with a genetic algorithm to provide an interactive, flexible system. Visualisation of data allows the human visual system to identify areas of interest, such as clusters, outliers or trends. A genetic algorithm based machine learning algorithm can t...
Hitch Haiku: An Interactive Supporting System for Composing Haiku Poem
Human communication is fostered in environments of regional communities and cultures and in different languages. Cultures are rooted in their unique histories. Communication media have been developed to circulate these cultural characteristics. The theme of our research is “Cultural Computing”, which means the translation of cultures using scientific methods representing essential aspects of Ja...
Haiku Generator that Reads Blogs and Illustrates Them with Sounds and Images
In this paper we introduce our haiku generator, which, in contrast to other systems, is not restricted to limited classic vocabulary sets and preserves a classic style without becoming too random and abstract because it performs a semantic integrity check using the Internet. Moreover, it is able to analyze blog entry input and, by using nouns and adjectives for web-mining, to stay on topic and ...
Intuitive Storytelling Interaction: ZENetic Computer
We tried to develop an interactive system that could help us recreate our conscious selves by calling on Buddhist principles, Asian philosophy, and traditional Japanese culture through the inspirational media of ink painting, kimono and haiku. “Recreating our selves” means the process of making the consciousness of our ‘daily self’ meet that of our ‘hidden self’ through rediscovering creative r...